LLM Company Policies and Policy Implications in Software Organizations

Khojah, Ranim, Mohamad, Mazen, Erlenhov, Linda, Neto, Francisco Gomes de Oliveira, Leitner, Philipp

arXiv.org Artificial Intelligence

The risks associated with adopting large language model (LLM) chatbots in software organizations highlight the need for clear policies. We examine how 11 companies create these policies and the factors that influence them, aiming to help managers safely integrate chatbots into development workflows. In software organizations, the software product is gradually evolving into AI-powered software (AIware) through the use of AI, more specifically large language models (LLMs), in the development process [2]. LLMs are increasingly seen as valuable tools for improving productivity, which has motivated enterprises to adopt them [3]. However, these models have introduced risks and concerns that affect the organization, the software engineers, and the product. Integrating LLMs into software development raises challenges related to the quality and ownership of generated content [4], which complicates accountability and can affect product reliability. In addition, interactions with LLMs (e.g., through external APIs) may expose organizations to liability when developers unintentionally transmit sensitive data, resulting in legal repercussions [5].
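To make the sensitive-data risk concrete, below is a minimal sketch of one guardrail such a policy might mandate: scanning and redacting obvious secrets before a prompt leaves the organization via an external LLM API. The patterns, placeholder format, and function names are our own illustrative assumptions, not taken from the paper.

```python
# Hypothetical policy control: redact obvious sensitive strings from a
# prompt before it is sent to an external LLM API. Patterns are illustrative.
import re

REDACTION_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api_key": re.compile(r"(?:sk|key|token)[-_][A-Za-z0-9]{16,}"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def redact_prompt(prompt: str) -> tuple[str, list[str]]:
    """Replace matches with placeholders and report which rules fired."""
    fired = []
    for name, pattern in REDACTION_PATTERNS.items():
        if pattern.search(prompt):
            fired.append(name)
            prompt = pattern.sub(f"[REDACTED:{name}]", prompt)
    return prompt, fired

safe_prompt, violations = redact_prompt(
    "Debug this: user jane.doe@corp.example failed auth with key sk-abc123def456ghi789jkl"
)
print(safe_prompt, violations)
```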


ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context

Wu, Zirui, Feng, Yansong

arXiv.org Artificial Intelligence

Tables play a crucial role in conveying information in various domains. We propose a Plan-then-Reason framework to answer different types of user queries over tables with sentence context. The framework first plans the reasoning paths over the context, then assigns each step to program-based or textual reasoning to reach the final answer. This framework enhances the table reasoning abilities of both in-context learning and fine-tuning methods. GPT-3.5-Turbo following the Plan-then-Reason framework surpasses other prompting baselines without self-consistency while using fewer API calls and in-context demonstrations. We also construct an instruction-tuning set, TrixInstruct, to evaluate the effectiveness of fine-tuning with this framework. We present the ProTrix model family, obtained by fine-tuning models on TrixInstruct. Our experiments show that the ProTrix family generalizes to diverse unseen tabular tasks with only 6k training instances. We further demonstrate that ProTrix can generate accurate and faithful explanations to answer complex free-form questions. Our work underscores the importance of planning and reasoning abilities for models on tabular tasks, both for generalizability and for interpretability. We open-source our dataset and models at https://github.com/WilliamZR/ProTrix.
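As a rough illustration of the control flow the abstract describes, the sketch below plans first and then routes each step to program-based or textual reasoning. The prompts, step tags, and the `call_llm`/`run_program` stubs are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal Plan-then-Reason loop: draft a plan over the table and sentence
# context, then route each step to program-based or textual reasoning.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LLM API call")

def run_program(code: str, table) -> str:
    raise NotImplementedError("stand-in for executing generated code on the table")

def plan_then_reason(question: str, table, context: str) -> str:
    plan = call_llm(
        f"Question: {question}\nContext: {context}\n"
        "Write numbered steps; tag each step [PROGRAM] or [TEXTUAL]."
    )
    answer_so_far = ""
    for step in plan.splitlines():
        if "[PROGRAM]" in step:
            code = call_llm(f"Write code for this step over the table:\n{step}")
            answer_so_far += run_program(code, table)
        elif "[TEXTUAL]" in step:
            answer_so_far += call_llm(f"Answer in prose: {step}\nSo far: {answer_so_far}")
    return call_llm(f"Combine into a final answer: {answer_so_far}")
```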


User Story Tutor (UST) to Support Agile Software Developers

Neo, Giseldo da Silva, Moura, José Antão Beltrão, de Almeida, Hyggo Oliveira, Neo, Alana Viana Borges da Silva, Júnior, Olival de Gusmão Freitas

arXiv.org Artificial Intelligence

User Stories record what must be built in projects that use agile practices. User Stories serve both to estimate effort, generally measured in Story Points, and to plan what should be done in a Sprint. Therefore, it is essential to train software engineers on how to create simple, easily readable, and comprehensive User Stories. For that reason, we designed, implemented, applied, and evaluated a web application called User Story Tutor (UST). UST checks the description of a given User Story for readability and, if needed, recommends appropriate practices for improvement. UST also estimates a User Story's effort in Story Points using machine learning techniques. As such, UST may support the continuing education of agile development teams when writing and reviewing User Stories. UST's ease of use was evaluated by 40 agile practitioners according to the Technology Acceptance Model (TAM) and AttrakDiff. The TAM evaluation averages were good for almost all variables considered, and the AttrakDiff evaluation produced similarly good results, suggesting that UST can be used reliably. Applying UST to assist in the construction of User Stories is a viable technique that, at the very least, can be used by agile development teams to complement and enhance current User Story creation.
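The sketch below illustrates, under our own assumptions, the two capabilities UST combines: a readability score for a User Story description and a machine-learning estimate of its Story Points. The Flesch-style readability approximation, text features, and model choice are illustrative, not the authors' actual design.

```python
# Illustrative versions of UST's two checks: readability scoring and
# ML-based Story Point estimation. Features and model are assumptions.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

def flesch_reading_ease(text: str) -> float:
    """Rough Flesch score using a vowel-group syllable approximation."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    n = max(1, len(words))
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

# Hypothetical training data: (story text, story points) pairs.
stories = ["As a user, I want to log in so that I can see my dashboard",
           "As an admin, I want to export all reports as CSV"]
points = [3, 5]

estimator = make_pipeline(TfidfVectorizer(), RandomForestRegressor(random_state=0))
estimator.fit(stories, points)

story = "As a user, I want to reset my password via email"
print(f"Readability: {flesch_reading_ease(story):.1f}")
print(f"Estimated Story Points: {estimator.predict([story])[0]:.1f}")
```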


Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam

Mendonça, Nabor C.

arXiv.org Artificial Intelligence

The recent integration of visual capabilities into Large Language Models (LLMs) has the potential to play a pivotal role in science and technology education, where visual elements such as diagrams, charts, and tables are commonly used to improve the learning experience. This study investigates the performance of ChatGPT-4 Vision, OpenAI's most advanced visual model at the time the study was conducted, on the Bachelor in Computer Science section of Brazil's 2021 National Undergraduate Exam (ENADE). By presenting the model with the exam's open and multiple-choice questions in their original image format and allowing for reassessment in response to differing answer keys, we were able to evaluate the model's reasoning and self-reflecting capabilities in a large-scale academic assessment involving textual and visual content. ChatGPT-4 Vision significantly outperformed the average exam participant, placing it within the top 10 score percentile. While it excelled in questions that incorporated visual elements, it also encountered challenges with question interpretation, logical reasoning, and visual acuity. The involvement of an independent expert panel to review cases of disagreement between the model and the answer key revealed some poorly constructed questions containing vague or ambiguous statements, calling attention to the critical need for improved question design in future exams. Our findings suggest that while ChatGPT-4 Vision shows promise in multimodal academic evaluations, human oversight remains crucial for verifying the model's accuracy and ensuring the fairness of high-stakes educational exams. The paper's research materials are publicly available at https://github.com/nabormendonca/gpt-4v-enade-cs-2021.
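A query of the kind the study performed might look like the following sketch, which sends an exam question in its original image format to a vision-capable model via the OpenAI API. The file name, prompt wording, and model identifier are assumptions; the paper's repository documents the actual setup.

```python
# Sketch: present an exam question image to a vision-capable model.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("enade_2021_question_12.png", "rb") as f:  # hypothetical file name
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # model name at the time; an assumption here
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Answer this multiple-choice question and explain your reasoning."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```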


Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

Ferreira, Alexandre R., Campelo, Cláudio E. C.

arXiv.org Artificial Intelligence

To train transcription models that produce robust results, a large and diverse labeled dataset is required. Finding data with the necessary characteristics is a challenging task, especially for languages less popular than English, and producing such data requires significant effort and often money. A strategy to mitigate this problem is the use of data augmentation techniques. In this work, we propose a framework for data augmentation based on deepfake audio. To validate the framework, experiments were conducted using existing deepfake and transcription models. A voice cloner and an English-language dataset recorded by Indian speakers were selected, ensuring the presence of a single accent in the dataset. The augmented data was then used to train speech-to-text models in various scenarios.
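A minimal sketch of the augmentation step might look as follows: synthesize one cloned (audio, transcript) pair per existing transcript and mix the result into the training set. The `clone_voice` stub stands in for whatever deepfake/TTS model the framework plugs in; the names and layout are our assumptions.

```python
# Sketch: grow a speech-to-text training set with voice-cloned audio.
from pathlib import Path

def clone_voice(text: str, reference_wav: Path, out_wav: Path) -> None:
    raise NotImplementedError("stand-in for a voice-cloning TTS model")

def augment_dataset(pairs: list[tuple[Path, str]], reference_wav: Path,
                    out_dir: Path) -> list[tuple[Path, str]]:
    """Return the original pairs plus one synthetic clone per transcript."""
    out_dir.mkdir(parents=True, exist_ok=True)
    augmented = list(pairs)
    for i, (_, transcript) in enumerate(pairs):
        synthetic = out_dir / f"clone_{i:05d}.wav"
        clone_voice(transcript, reference_wav, synthetic)
        augmented.append((synthetic, transcript))
    return augmented
```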


Machine Learning Simulates Agent-Based Model Towards Policy

Furtado, Bernardo Alves, Andreão, Gustavo Onofre

arXiv.org Artificial Intelligence

Public policies are not intrinsically positive or negative. Rather, policies provide varying levels of effect across different recipients. Methodologically, computational modeling enables the application of multiple influences on empirical data, thus allowing for heterogeneous responses to policies. We use a random forest machine learning algorithm to emulate an agent-based model (ABM) and evaluate competing policies across 46 Metropolitan Regions (MRs) in Brazil. In doing so, we use the input parameters and output indicators of 11,076 actual simulation runs and one million emulated runs. As a result, we obtain the optimal (and non-optimal) performance of each region over the policies. The optimum is defined as a combination of GDP production and the Gini inequality coefficient for the full ensemble of Metropolitan Regions. Results suggest that MRs already have embedded structures that favor optimal or non-optimal results, but they also illustrate which policy is more beneficial to each place. In addition to providing MR-specific policy results, using machine learning to emulate an ABM reduces the computational burden while allowing for much larger variation among model parameters. The coherence of the results within the context of larger uncertainty, vis-à-vis those of the original ABM, reinforces the robustness of the model. At the same time, the exercise indicates which parameters policymakers should intervene on in order to work towards precise, optimal policy instruments.
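The surrogate-modeling idea can be sketched as follows: fit a random forest on (input parameters, output indicators) pairs from real simulation runs, then emulate many more runs cheaply and score them on a combined GDP/Gini objective. The synthetic data, column choices, and objective weighting below are illustrative, not the paper's.

```python
# Sketch: random forest surrogate of an ABM, then cheap large-scale emulation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-in for the 11,076 actual simulation runs: rows are parameter
# settings, targets are (GDP, Gini). The data here is synthetic.
X_sim = rng.uniform(0, 1, size=(11076, 5))  # 5 hypothetical policy parameters
y_sim = np.column_stack([
    X_sim @ rng.uniform(0.5, 2.0, 5),                              # fake GDP
    0.6 - 0.3 * X_sim[:, 0] + 0.05 * rng.standard_normal(11076),   # fake Gini
])

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X_sim, y_sim)

# Emulate many runs without touching the ABM (scaled down from the
# paper's one million for speed).
X_emulated = rng.uniform(0, 1, size=(100_000, 5))
gdp, gini = surrogate.predict(X_emulated).T

# Score each emulated run: higher GDP and lower Gini are better.
score = (gdp - gdp.mean()) / gdp.std() - (gini - gini.mean()) / gini.std()
best = X_emulated[score.argmax()]
print("best emulated parameter setting:", np.round(best, 3))
```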


Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline

Belli, Luca, Ktena, Sofia Ira, Tejani, Alykhan, Lung-Yut-Fon, Alexandre, Portman, Frank, Zhu, Xiao, Xie, Yuanpu, Gupta, Akshay, Bronstein, Michael, Delić, Amra, Sottocornola, Gabriele, Anelli, Walter, Andrade, Nazareno, Smith, Jessie, Shi, Wenzhe

arXiv.org Machine Learning

Recommender systems constitute the core engine of most social network platforms nowadays, aiming to maximize user satisfaction along with other key business objectives. Twitter is no exception. Despite the fact that Twitter data has been extensively used to understand socioeconomic and political phenomena and user behaviour, the implicit feedback provided by users on Tweets through their engagements on the Home Timeline has only been explored to a limited extent. At the same time, there is a lack of large-scale public social network datasets that would enable the scientific community to both benchmark and build more powerful and comprehensive models that tailor content to user interests. By releasing an original dataset of 160 million Tweets along with engagement information, Twitter aims to address exactly that. During this release, special attention was paid to maintaining compliance with existing privacy laws. Apart from user privacy, this paper touches on the key challenges faced by researchers and professionals striving to predict user engagements. It further describes the key aspects of the RecSys 2020 Challenge that was organized by ACM RecSys in partnership with Twitter using this dataset.


A Unified Framework for Structured Graph Learning via Spectral Constraints

Kumar, Sandeep, Ying, Jiaxi, Cardoso, José Vinícius de M., Palomar, Daniel

arXiv.org Machine Learning

Graph learning from data represents a canonical problem that has received substantial attention in the literature. However, insufficient work has been done on incorporating prior structural knowledge into the learning of underlying graphical models from data. Learning a graph with a specific structure is essential for interpretability and for identifying the relationships among data. Useful structured graphs include the multi-component graph, bipartite graph, connected graph, sparse graph, and regular graph. In general, structured graph learning is an NP-hard combinatorial problem; therefore, designing a general tractable optimization method is extremely challenging. In this paper, we introduce a unified graph learning framework at the intersection of Gaussian graphical models and spectral graph theory. To impose a particular structure on a graph, we first show how to formulate the combinatorial constraints as an analytical property of the graph matrix. We then develop an optimization framework that enables graph learning with specific structures via spectral constraints on graph matrices. The proposed algorithms are provably convergent, computationally efficient, and practical for numerous graph-based tasks. Extensive numerical experiments with both synthetic and real data sets illustrate the effectiveness of the proposed algorithms. The code for all the simulations is made available as an open-source repository.
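As one concrete instance of turning a combinatorial constraint into an analytical one, the sketch below checks the classic spectral fact this line of work builds on: a graph has k connected components exactly when its Laplacian has k zero eigenvalues, so "learn a k-component graph" becomes a constraint on the spectrum. This is a standard property, shown here for illustration rather than as the authors' code.

```python
# Verify: multiplicity of the Laplacian's zero eigenvalue equals the
# number of connected components.
import numpy as np

def laplacian(adjacency: np.ndarray) -> np.ndarray:
    return np.diag(adjacency.sum(axis=1)) - adjacency

# Two disjoint triangles -> expect 2 zero eigenvalues.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0

eigvals = np.linalg.eigvalsh(laplacian(A))
k = int(np.sum(eigvals < 1e-9))
print(f"zero eigenvalues: {k}")  # 2, matching the number of components
```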


Grand Challenge: Real-time Destination and ETA Prediction for Maritime Traffic

Bodunov, Oleh, Schmidt, Florian, Martin, André, Brito, Andrey, Fetzer, Christof

arXiv.org Machine Learning

The challenge asks participants to predict, in a streaming fashion, (i) the destination and (ii) the arrival time of ships using geospatial data in the maritime context. Novel aspects of our approach include the use of ensemble learning based on Random Forests, Gradient Boosting Decision Trees (GBDT), XGBoost trees, and Extremely Randomized Trees (ERT) to predict the destination, while for the arrival time we propose the use of feed-forward neural networks. In our evaluation, we achieved an accuracy of 97% on the port destination classification problem and 90% (in minutes) on the ETA prediction.
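A scikit-learn-only sketch of this two-model setup is below: a soft-voting ensemble for the destination port and a feed-forward network for the ETA. The features and targets are random placeholders for the engineered geospatial inputs, and XGBoost is omitted to avoid the extra dependency; none of this is the authors' actual pipeline.

```python
# Sketch: voting ensemble for destination port, feed-forward net for ETA.
import numpy as np
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              ExtraTreesClassifier, VotingClassifier)
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 6))            # e.g. lat, lon, speed, course, ...
ports = rng.integers(0, 5, size=500)      # destination port labels
eta_minutes = rng.uniform(30, 600, 500)   # arrival time targets

destination_model = VotingClassifier([
    ("rf", RandomForestClassifier(random_state=0)),
    ("gbdt", GradientBoostingClassifier(random_state=0)),
    ("ert", ExtraTreesClassifier(random_state=0)),
], voting="soft")
destination_model.fit(X, ports)

eta_model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
eta_model.fit(X, eta_minutes)

x_new = X[:1]
print("predicted port:", destination_model.predict(x_new)[0])
print("predicted ETA (min):", round(float(eta_model.predict(x_new)[0]), 1))
```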